# Energy efficient low static-power voltage level shifter






### Naeem Maroof, Muhammad Sohail, and Hyunchul Shin<sup>a</sup>

Department of Electronics and Communication Engineering, Hanyang University

55 Hanyangdaehak-ro, Sangnok-gu, Ansan, Gyeonggi-do, 426-791, Korea

a) shin@hanyang.ac.kr

Abstract: Level shifter circuits are necessary parts of modern SoCs, as they interface signals between different voltage domains. This paper presents an energy-efficient level-up shifter capable of converting a sub-threshold input signal to higher levels. The proposed design uses the feedback mechanism of a regulated gate cascode to achieve energy-efficient operation. Once the desired output level is reached, no large static current flows, and static power is further minimized using a transistor stack. Implemented in a 90-nm process, the proposed level shifter shows, in post-layout simulation, a propagation delay of 21.2 ns, a total energy-per-transition of only 77.5 fJ, and a static power dissipation of 7.2 nW.

**Keywords:** level shifter, energy efficient, sub-threshold, low-power

**Classification:** Integrated circuits

#### References

- [1] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester and D. Blaauw: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 17 (2009) 1127. DOI:10.1109/TVLSI.2008.2007564
- [2] S. Lutkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann and U. Ruckert: IEEE J. Solid-State Circuits 48 (2013) 8. DOI:10.1109/JSSC.2012.2220671
- [3] S. Henzler: Power Management of Digital Circuits in Deep Sub-Micron CMOS Technologies (Springer, Heidelberg, Germany, 2007) 24.
- [4] S. Lutkemeier and U. Ruckert: IEEE Trans. Circuits Syst. II, Exp. Briefs 57 (2010) 721. DOI:10.1109/TCSII.2010.2056110
- [5] S. N. Wooters, B. H. Calhoun and T. N. Blalock: IEEE Trans. Circuits Syst. II, Exp. Briefs 57 (2010) 290. DOI:10.1109/TCSII.2010.2043471
- [6] M. Lanuzza, P. Corsonello and S. Perri: IEEE Trans. Circuits Syst. II, Exp. Briefs 59 (2012) 922. DOI:10.1109/TCSII.2012.2231037
- [7] T.-H. Chen, J. Chen and L. T. Clark: J. Low Power Electron. 2 (2006) 251. DOI:10.1166/jolpe.2006.071
- [8] Y. Osaki, T. Hirose, N. Kuroki and M. Numa: IEICE Electron. Express 8 (2011) 890-896. DOI:10.1587/elex.8.890
- [9] S. R. Hosseini, M. Saberi and R. Lotfi: IEEE Trans. Circuits Syst. II, Exp. Briefs 61 (2014) 753. DOI:10.1109/TCSII.2014.2345295

#### 1 Introduction

Multiple voltage domains are commonplace in modern systems-on-chip (SoCs), where different digital blocks are assigned to different voltage domains to achieve better energy efficiency. Level shifter (LS) circuits are placed as an interface between a low voltage domain ($V_{\rm DDL}$) and a high voltage domain ($V_{\rm DDH}$) to convert a low voltage signal into higher levels. Sub-threshold operation has recently received attention, as reducing the supply voltage is one of the most effective knobs for low-power battery-operated devices [1, 2]. Designing an LS circuit that can reliably convert sub-threshold input levels into a full $V_{\rm DDH}$ signal with a minimum power and delay penalty in deep-submicron technologies is challenging [3], as conventional designs either require an unreasonable amount of area or incur higher static and dynamic power dissipation.

A conventional LS (CLS) is a cross-coupled half-latch, as shown in Fig. 1(a). $V_{IN}$ and $V_{INB}$ are complementary input signals of the $V_{DDL}$ voltage domain. When $V_{IN}=V_{DDL}$, the MN1 transistor pulls down node $V_A$, and thus node $V_Z$ is charged by MP2 (because transistor MN2 is OFF, as it is driven by $V_{INB}$). However, for every transition of the input, there is a large contention between the pull-up and pull-down networks. When $V_{INB}$ is high ($V_{DDL}$), MN2 tries to pull down node $V_Z$, which, however, is still being charged by transistor MP2. Because of this contention, a large current flows over an extended period of time, and the operation might still fail. In a 90-nm process technology with $V_{DDL}$ of 200 mV and $V_{DDH}$ of 1 V, CLS requires NMOS transistors to be about 2400 times wider than the corresponding PMOS transistors to produce the correct output [4].

Several works have been reported to overcome the limitations of CLS. Zhai et al. cascaded four stages of CLS to convert a 200 mV signal into a 1.2 V signal [1]. However, generation of intermediate voltages requires voltage regulators that incur power and area penalties. Wooters et al. used only two CLS stages, where the first stage is connected to $V_{DDH}$ through a diode-connected NMOS device [5]. This design avoids intermediate supply voltages, but does not achieve high-speed performance. Multi-threshold technology has also been used to strengthen the pull-down network and weaken the pull-up network by using low-threshold and high-threshold devices, respectively [6]. However, the use of multi-threshold devices is complex and expensive. Reducing the strength of the pull-up devices by limiting their currents with a current-mirror [7] allows a wide input range, but requires a large static current through the reference path and hence dissipates considerable static power. A Wilson current-mirror based LS (WLS) is presented in [4] and is shown in Fig. 1(b). WLS has significantly better speed performance and energy efficiency; however, the output node of the current-mirror floats for a high level of V<sub>IN</sub>, which has a significant detrimental effect on the output buffer. An LS with logic-error correction (LSLEC), proposed in [8], uses current generators (CGs) that supply current when the logic level of the output does not correspond to the logic level of the input. The contention problem of LSLEC is discussed in [9], whose authors propose a new LS using CGs and solve the problem by feeding back the output signal. However, their design (without an inverter) is based on 12 transistors, and proper sizing is required to charge the internal nodes and reduce the power dissipation of the inverter (the output of which is fed back to the CGs).

In this paper, we propose an energy-efficient voltage level-up shifter that is capable of converting a sub-threshold input signal to higher levels. The proposed design uses the feedback mechanism of the regulated gate cascode to achieve energy-efficient operation. No large static current flows once the rail-to-rail output is reached, and the static power dissipation is further minimized using a transistor stack. We compare our design with the one proposed in [4]. The rest of the paper is organized as follows. Section 2 describes the operation of the reference and the proposed designs. Section 3 presents the post-layout results along with the variability analysis, and Section 4 concludes the paper.



Fig. 1: LS circuit schematics: (a) CLS (b) WLS (c) NLS.

#### 2 Proposed level shifter

The reference (WLS) and the proposed (NLS) level shifter circuit schematics are shown in Fig. 1(b) and Fig. 1(c), respectively. For WLS, when the input is $V_{\rm DDL}$, MN1 is ON and hence current can pass through the left branch. MP2 copies the current via the current-mirror operation and charges the node $V_{\rm Z}$, as the transistor MN2 is OFF. As $V_{\rm Z}$ reaches a higher value, MP3 turns OFF and hence no further current passes through the left branch. However, while $V_{\rm Z}$ is charging, the left branch current decreases, which in effect decreases the charging rate of $V_{\rm Z}$. This is due to the negative feedback operation, as MP3 is controlled by $V_{\rm Z}$. When the input is low, no current can pass through the left branch because the transistor MN1 is OFF; hence no current is copied via the MP2 transistor, and the node $V_{\rm Z}$ is discharged through the MN2 transistor. However, the leakage current of MP2 prevents $V_Z$ from being discharged as fully and efficiently as it should be.

Our proposed level shifter consists of six MOSFET devices. Transistors MN1 and MN2 are driven by the complementary input signals. MP1 and MP2 form a current-mirror, with $V_{SG}$ of MP2 being $V_{DDH}-V_{B}$. Transistor MP3 is used to regulate the gate voltage of MP4, and forms the transistor stack for the left branch. When the input is high, i.e. $V_{IN}=V_{DDL}$, MN1 discharges the node V<sub>A</sub>, which turns ON the MP4 transistor, and hence the current in the right branch flows to charge the node V<sub>Z</sub>. Transistor MP1 copies the current into the left branch via the current-mirror operation. Once the node V<sub>Z</sub> is charged, no further current can flow through MP4, and hence node V<sub>B</sub> settles at a higher value and in effect turns off the current-mirror. Thus, no more current flows in either of the branches. The static current is minimized as MP3 is also in the cut-off region. When the input is low (0 V), MN2 is ON and MN1 is OFF. Transistor MN2 starts discharging the node $V_Z$. At this point, MP4 conducts current and the current-mirror turns ON. The current through the left branch charges the node V<sub>A</sub> and thus turns OFF the MP4 transistor. Once again, V<sub>B</sub> settles at a higher voltage such that V<sub>DDH</sub>-V<sub>B</sub> is less than |V<sub>THP</sub>|, and the node V<sub>Z</sub> is easily pulled down by MN2. Leakage current is minimized as the current-mirror and the MP3 and MP4 transistors are OFF.

Table I: MOSFET aspect ratios of main stages and buffer

| Circuit     | Transistors        | W/L ($\mu$m/$\mu$m) |
|-------------|--------------------|---------------------|
| NLS         | MN1, MN2, MP1, MP2 | 0.24/0.1            |
| NLS         | MP3, MP4           | 0.12/0.1            |
| WLS         | MN1, MN2, MP1, MP2 | 0.24/0.1            |
| WLS         | MP3                | 0.12/0.1            |
| Buffer INV1 | MN1                | 0.12/0.2            |
| Buffer INV1 | MP1                | 0.24/0.2            |
| Buffer INV2 | MN2                | 0.24/0.2            |
| Buffer INV2 | MP2                | 0.48/0.2            |

#### 3 Results

The following describes the post-layout simulations of both the proposed and the reference LS circuits in a 90-nm process technology. The value of $V_{\rm DDH}$ is kept at 1 V, and V<sub>DDL</sub> is varied from sub-threshold levels to 1 V. A buffer is added to the output of both level shifters to achieve rail-to-rail output and to drive the next stage. A capacitive load of 100 fF is used at the output of the buffer. The aspect ratios (W/L) of all the transistors are given in Table I. In the layout, each transistor is realized using multiple minimum-width fingers. The finished layout of both LSs, including the main stage and the output buffer, is shown in Fig. 2, with NLS occupying an area of 24.95 $\mu m^2$ and WLS an area of 21.38 $\mu m^2$. We evaluate both LSs on the basis of the propagation delay $(\tau_p)$, total energy-per-transition $(E_{tr})$, energy-delay product (EDP), and static power dissipation (Ps). All of the results consider both the main LS structure and the output buffer. If the output duty cycle is not between 0.45 and 0.55, the LS is considered to have failed for that input signal. The design target is a 200 mV input signal with a frequency of 1 MHz, rise and fall times of 10 ns, and a duty cycle of 0.5.
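The pass/fail criterion on the output duty cycle can be illustrated with a minimal sketch. It assumes that the output is sampled uniformly over exactly one period and that the logic threshold is $V_{\rm DDH}/2$; neither the threshold nor the function names `duty_cycle` and `ls_passes` come from the paper, and the waveforms are hypothetical rather than simulation data.

```python
def duty_cycle(samples, v_ddh=1.0):
    """Fraction of uniformly spaced samples above the logic threshold."""
    threshold = 0.5 * v_ddh  # assumption: logic threshold at V_DDH / 2
    high = sum(1 for v in samples if v > threshold)
    return high / len(samples)


def ls_passes(samples, v_ddh=1.0, lo=0.45, hi=0.55):
    """Apply the 0.45 to 0.55 duty-cycle window to one output period."""
    return lo <= duty_cycle(samples, v_ddh) <= hi


# Hypothetical one-period output waveforms (100 samples each).
balanced = [1.0] * 50 + [0.0] * 50  # high for half of the period
skewed = [1.0] * 70 + [0.0] * 30    # high for 70% of the period

print(ls_passes(balanced))  # True
print(ls_passes(skewed))    # False
```

A real check would take the samples from the simulated buffer output instead of these idealized square waves.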









Fig. 2: Layout of both LSs (including the buffer): (a) WLS (b) NLS.



Fig. 3: Output node voltage of the main conversion stages of both LSs.

Table II: Propagation delay of both LSs at different values of V<sub>DDL</sub>

| $V_{\mathrm{DDL}}(V)$ | $\tau_{\mathrm{pNLS}}(\mathrm{ns})$ | $\tau_{\mathrm{pWLS}}(\mathrm{ns})$ | $\frac{\tau_{pNLS}}{\tau_{pWLS}}$ |
|-----------------------|-------------------------------------|-------------------------------------|-----------------------------------|
| 0.2                   | 21.16                               | 19.625                              | 1.078                             |
| 0.3                   | 6.729                               | 6.472                               | 1.04                              |
| 0.4                   | 4.317                               | 4.38                                | 0.986                             |
| 0.5                   | 2.964                               | 3.13                                | 0.947                             |
| 0.7                   | 1.724                               | 2.344                               | 0.735                             |

Both LSs operate well at sub-threshold input levels. The minimum input level for both LSs is 160 mV. Fig. 3 shows the output node voltage of the main stages of both LSs for a 200 mV input. As shown in the inset, the proposed LS achieves better performance for the rising and falling transitions. The improved output slew rate provides better energy efficiency. The output of the NLS reaches relatively higher (lower) levels than the reference design for a logic high (low) level of the input signal. This reduces the static power dissipation through the output buffer. Also, for NLS, at least two transistors of both the right and the left branch are OFF in either (high/low) case of the input, which further lowers the static power. The static power dissipation is 7.22 nW for NLS and 12.2 nW for WLS. The propagation delay of both LSs against V<sub>DDL</sub> is presented in Table II. Compared with WLS, NLS exhibits 8% higher propagation delay at 200 mV. The propagation delay of NLS improves as $V_{\rm DDL}$ increases, and NLS offers 26% lower propagation delay than WLS at a $V_{\rm DDL}$ of 700 mV. Fig. 4 shows the $E_{\rm tr}$ of both LSs in logarithmic scale on the left vertical axis and the normalized $E_{\rm tr}$ of NLS with respect to WLS (in percentage) on the right vertical axis.
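The delay ratios in Table II and the percentages quoted above can be reproduced from the tabulated delays. The sketch below only restates the table values; the names `delays_ns` and `delay_ratio` are illustrative and not part of the original work.

```python
# Post-layout propagation delays from Table II, in ns, keyed by V_DDL in volts.
# Each entry is (NLS delay, WLS delay).
delays_ns = {
    0.2: (21.16, 19.625),
    0.3: (6.729, 6.472),
    0.4: (4.317, 4.38),
    0.5: (2.964, 3.13),
    0.7: (1.724, 2.344),
}


def delay_ratio(v_ddl):
    """Ratio of the NLS delay to the WLS delay at a given V_DDL."""
    nls, wls = delays_ns[v_ddl]
    return nls / wls


for v_ddl in delays_ns:
    ratio = delay_ratio(v_ddl)
    change = 100.0 * (ratio - 1.0)  # positive means NLS is slower than WLS
    print(f"V_DDL = {v_ddl:.1f} V: ratio = {ratio:.3f}, NLS vs WLS = {change:+.1f}%")
```

The ratio exceeds 1 at 0.2 V and 0.3 V and falls below 1 from 0.4 V upward, so the crossover between the two designs lies between 0.3 V and 0.4 V.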







Fig. 4: The total (left-axis) and normalized (right-axis) E<sub>tr</sub> of both LSs.



Fig. 5: Propagation delay distributions through MC simulations.

Compared with WLS, NLS provides an $E_{\rm tr}$ reduction of 17.7% at the design target and, on average, 12.9% over the entire range of $V_{\rm DDL}$. The EDP of NLS at the design target is only 88.7% of the EDP offered by WLS. On average, the EDP of NLS is only 77.1% of that of WLS over the entire $V_{\rm DDL}$ range. The $E_{\rm tr}$ of both LSs is almost flat in the mid-range, while for higher values of $V_{\rm DDL}$, the energy consumption increases for both LSs. Monte Carlo (MC) simulations were carried out to assess the impact of local and global variations. Delay histograms for a 1000-point MC simulation are shown in Fig. 5 for two different values of $V_{\rm DDL}$ (200 mV and 500 mV). For both values of $V_{\rm DDL}$, the normalized variance of NLS is lower than that of WLS, which indicates robust operation against variations.
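As a consistency check, the EDP figure at the design target follows from the reported energy reduction and the delay ratio in Table II, assuming the usual definition of EDP as the product of $E_{\rm tr}$ and $\tau_p$ (the text does not state the definition explicitly):

$$
\frac{\mathrm{EDP}_{\rm NLS}}{\mathrm{EDP}_{\rm WLS}}
= \frac{E_{\rm tr,NLS}}{E_{\rm tr,WLS}} \cdot \frac{\tau_{pNLS}}{\tau_{pWLS}}
\approx (1 - 0.177) \times \frac{21.16}{19.625}
\approx 0.823 \times 1.078
\approx 0.887
$$

This agrees with the stated 88.7%: the 17.7% energy saving outweighs the roughly 8% delay penalty at 200 mV.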

#### 4 Conclusion

An energy-efficient, low static-power voltage LS circuit is presented. The proposed circuit uses the feedback mechanism of the regulated gate cascode to achieve energy efficiency, and the transistor stacking effect to minimize the static power. Compared with a state-of-the-art LS, the proposed design offers considerable improvement in the EDP, and works well for $V_{\rm DDL}$ as low as 160 mV. At the design target (200 mV, 1 MHz frequency), the design offers a delay of 21.2 ns with a total $E_{\rm tr}$ of only 77.5 fJ and a static power of 7.2 nW.

#### 5 Acknowledgments

Tools were provided by IDEC, Korea (idec.or.kr). The first and second authors are supported by HEC, Pakistan (hec.gov.pk).

